- Asia > Middle East > Jordan (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Research Report > Experimental Study (0.68)
- Research Report > New Finding (0.46)
- South America > Argentina > Patagonia > Río Negro Province > Viedma (0.04)
- North America > United States (0.04)
- Europe > France (0.04)
- Asia > Middle East > Jordan (0.04)
- Media (0.45)
- Leisure & Entertainment (0.45)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Information Management (0.67)
- Europe > Switzerland > Zürich > Zürich (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom (0.46)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (0.68)
- Research Report > Experimental Study (0.68)
- Government (0.94)
- Health & Medicine > Consumer Health (0.46)
Remember, but also, Forget: Bridging Myopic and Perfect Recall Fairness with Past-Discounting
Dynamic resource allocation in multi-agent settings often requires balancing efficiency with fairness over time--a challenge inadequately addressed by conventional, myopic fairness measures. Motivated by behavioral insights that human judgments of fairness evolve with temporal distance, we introduce a novel framework for temporal fairness that incorporates past-discounting mechanisms. By applying a tunable discount factor to historical utilities, our approach interpolates between instantaneous and perfect-recall fairness, thereby capturing both immediate outcomes and long-term equity considerations. Beyond aligning more closely with human perceptions of fairness, this past-discounting method ensures that the augmented state space remains bounded, significantly improving computational tractability in sequential decision-making settings. We detail the formulation of discounted-recall fairness in both additive and averaged utility contexts, illustrate its benefits through practical examples, and discuss its implications for designing balanced, scalable resource allocation strategies.
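The abstract does not give the formula, but geometric past-discounting has a natural recursive form. A minimal sketch in Python of the additive (cumulative) variant, assuming per-agent utility streams and a discount factor `lam` in [0, 1]; the function name and the egalitarian summary at the end are illustrative, not from the paper:

```python
import numpy as np

def discounted_recall_utilities(utility_history: np.ndarray, lam: float) -> np.ndarray:
    """Per-agent past-discounted utility: u~_i(t) = sum_{k<=t} lam**(t-k) * u_i(k).

    utility_history: shape (T, n_agents), one row per time step.
    lam = 0 recovers myopic (instantaneous) fairness; lam = 1 recovers
    perfect recall (the full, undiscounted history).
    """
    u_tilde = np.zeros(utility_history.shape[1])
    for u_t in utility_history:
        # Recursive update: older utilities are geometrically down-weighted,
        # so the carried fairness state stays bounded whenever lam < 1.
        u_tilde = lam * u_tilde + u_t
    return u_tilde

# Example: judge fairness of the discounted totals via the worst-off agent.
history = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
print(discounted_recall_utilities(history, lam=0.9).min())
```

For step utilities bounded by u_max and lam < 1, the discounted total never exceeds u_max / (1 - lam), consistent with the bounded-state-space claim in the abstract.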
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > Missouri (0.05)
Comparing Targeting Strategies for Maximizing Social Welfare with Limited Resources
Machine learning is increasingly used to select which individuals receive limited-resource interventions in domains such as human services, education, and development. However, it is often not apparent which quantity models should predict. In particular, policymakers rarely have access to data from a randomized controlled trial (RCT) that would enable accurate estimates of treatment effects -- that is, which individuals would benefit most from the intervention. Observational data is more likely to be available, creating a substantial risk of bias in treatment effect estimates. Practitioners instead commonly use a technique termed "risk-based targeting", where the model is used only to predict each individual's status quo outcome (an easier, non-causal task); those with higher predicted risk are offered treatment. There is currently almost no empirical evidence to inform which choices lead to the most effective machine-learning-informed targeting strategies in social domains. In this work, we use data from 5 real-world RCTs in a variety of domains to empirically assess such choices. We find that risk-based targeting is almost always inferior to targeting based on even biased estimates of treatment effects. Moreover, these results hold even when the policymaker has strong normative preferences for assisting higher-risk individuals. Our results imply that, despite the widespread use of risk prediction models in applied settings, practitioners may be better off incorporating even weak evidence about heterogeneous causal effects to inform targeting.
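As a concrete illustration of the two strategies being compared, here is a minimal sketch under a budget of k treatment slots; the score arrays are placeholders for model outputs, not the paper's data or code:

```python
import numpy as np

def select_top_k(scores: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k individuals with the highest scores."""
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)
n, k = 1000, 100

# Risk-based targeting: predict each person's status quo (untreated) outcome,
# the easier, non-causal task, and treat those at highest predicted risk.
risk_scores = rng.uniform(size=n)

# Effect-based targeting: rank by (possibly biased) treatment effect
# estimates, e.g. learned from observational data.
cate_estimates = rng.normal(size=n)

treated_by_risk = select_top_k(risk_scores, k)
treated_by_effect = select_top_k(cate_estimates, k)
```

The paper's empirical claim is that the second rule tends to dominate the first even when the effect estimates are biased.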
- North America > United States > Tennessee (0.05)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Asia > India (0.04)
- (3 more...)
- Research Report > Strength High (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine (1.00)
- Government (1.00)
- Education > Educational Setting (0.68)
Balancing Act: Prioritization Strategies for LLM-Designed Restless Bandit Rewards
Shresth Verma, Niclas Boehmer, Lingkai Kong, Milind Tambe
LLMs are increasingly used to design reward functions based on human preferences in Reinforcement Learning (RL). We focus on LLM-designed rewards for Restless Multi-Armed Bandits, a framework for allocating limited resources among agents. In applications such as public health, this approach empowers grassroots health workers to tailor automated allocation decisions to community needs. In the presence of multiple agents, altering the reward function based on human preferences can impact subpopulations very differently, leading to complex tradeoffs and a multi-objective resource allocation problem. We present the first principled method, termed Social Choice Language Model, for handling these tradeoffs in LLM-designed rewards for multiagent planners in general and restless bandits in particular. The novel part of our model is a transparent and configurable selection component external to the LLM, called an adjudicator, which controls complex tradeoffs via a user-selected social welfare function. Our experiments demonstrate that our model reliably selects more effective, aligned, and balanced reward functions compared to purely LLM-based approaches.
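The abstract describes the adjudicator only at a high level. A minimal sketch of the selection step it implies, assuming each candidate LLM-designed reward has already been evaluated down to a utility per subpopulation; the candidate values and the particular welfare functions below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

# Rows: candidate reward functions proposed by the LLM.
# Columns: resulting utility for each subpopulation.
candidate_utilities = np.array([
    [0.9, 0.4, 0.3],   # highest total, but uneven across groups
    [0.5, 0.5, 0.5],   # lower total, perfectly balanced
    [0.6, 0.4, 0.5],
])

# User-selected social welfare functions the adjudicator can apply.
welfare_functions = {
    "utilitarian": lambda u: u.sum(),    # maximize total utility
    "egalitarian": lambda u: u.min(),    # maximize the worst-off group
    "nash": lambda u: np.log(u).sum(),   # maximize the product of utilities
}

def adjudicate(utilities: np.ndarray, welfare: str) -> int:
    """Return the index of the candidate reward that maximizes the
    user-selected social welfare function."""
    w = welfare_functions[welfare]
    return int(np.argmax([w(u) for u in utilities]))

print(adjudicate(candidate_utilities, "utilitarian"))  # -> 0
print(adjudicate(candidate_utilities, "egalitarian"))  # -> 1
```

Keeping this choice outside the LLM is what makes the tradeoff transparent: swapping the welfare function changes which reward wins without re-prompting the model.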